As the world becomes ever more data oriented, much greater emphasis is being placed on getting data from one place to another. To complicate matters, data can be stored in many different formats, contexts, filesystems, and locations. In addition, the data often requires significant transformation and conversion processing as it is moved around. Whether you are trying to move data from Excel to SQL Server, create a data mart (or data warehouse), or distribute data to heterogeneous databases, you are essentially empowering someone with data.
This section describes the SSIS environment and how it addresses these needs. As mentioned earlier, the focus is on importing, exporting, and transforming data from one or more data sources to one or more data targets.
Common requirements of SSIS might include the following:
Exporting data out of SQL Server tables to other applications and environments (for example, to ODBC or OLE DB data sources, or via flat files)
Importing data into SQL Server tables from other applications and environments (for example, from ODBC or OLE DB data sources, or via flat files)
Initializing data in some data replication situations, such as initial snapshots
Aggregating data (that is, data transformation) for distribution to/from data marts or data warehouses
Changing the data’s context or format before importing or exporting it (that is, data conversion)
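The import, export, and conversion requirements above can be illustrated outside SSIS with a small sketch. The Python below is not SSIS and does not use any SSIS API. It assumes an in-memory SQLite database as a stand-in for a SQL Server table and a hypothetical `orders` flat file. The names `FLAT_FILE`, `import_orders`, and `export_orders` are illustrative only. It shows the basic steps a simple package performs: read delimited text, convert each field to a typed column, load the rows into a table, and export them back out.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; an SSIS package would read this through a Flat File source.
FLAT_FILE = """order_id,amount,order_date
1001,19.99,2008-03-01
1002,5.50,2008-03-02
"""


def import_orders(conn: sqlite3.Connection, text: str) -> int:
    """Parse delimited text, convert field types, and insert the rows into a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, order_date TEXT)"
    )
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Data conversion: every flat-file field arrives as a string.
        rows.append((int(record["order_id"]), float(record["amount"]), record["order_date"]))
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def export_orders(conn: sqlite3.Connection) -> str:
    """Export the table back out as delimited text (the reverse direction)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["order_id", "amount", "order_date"])
    writer.writerows(
        conn.execute("SELECT order_id, amount, order_date FROM orders ORDER BY order_id")
    )
    return out.getvalue()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(import_orders(conn, FLAT_FILE), "rows imported")
    print(export_orders(conn), end="")
```

The conversion step is the part that usually grows in a real package. Date parsing, code standardization, and rejected-row handling would all sit between reading and loading.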
Some typical business scenarios for SSIS might include the following:
Enabling data marts to receive data from a master data warehouse through periodic updates (see Figure 1)
Populating a master data warehouse from legacy systems (see Figure 2)
Initializing heterogeneous replication subscriber tables on Oracle from a SQL Server 2008 Publisher (see Figure 3)
Pulling sales data directly into SQL Server 2008 from an Access or Excel application (see Figure 4)
Exporting static time-reporting data files (that is, flat files) for distribution to remote consultants
Importing new orders directly or indirectly from sales force automation or distributed sales systems
In general, you need SSIS if any of the following conditions exist:
You need to import data directly into SQL Server from one or more ODBC data sources, .NET and OLE DB data providers, or via flat files.
You need to export data directly out of SQL Server to one or more ODBC data sources, .NET and OLE DB data providers, or via flat files.
You need to perform data conversions, data cleansing/data standardization, transformations, merges, or aggregations on data from one or more data sources for distribution to one or more data targets. You also need SSIS if that data must be accessed directly via any ODBC data source, .NET or OLE DB data provider, or flat file.
Your bulk data movement doesn’t have to be faster than the speed of light. SSIS must use conventional connection techniques to reach these data sources, and it must create intermediate buffers to hold data during the transformation steps. This usually disqualifies SSIS when raw performance is the overriding requirement (at least for large, bulk data movements with any type of data transformation defined). However, SSIS and the data providers it now supports include many performance enhancements, which have resulted in about a 50% increase in bulk data movement speeds. Alternative import/export facilities such as bcp offer better performance but lack the flexibility of SSIS.
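The buffering cost described above can be pictured with a rough model. This Python sketch is only an analogy and not the SSIS data flow engine. It assumes that rows are grouped into fixed-size in-memory buffers and that each transformation is applied to one buffer at a time. The buffer size of 2 in the example is arbitrary. The helper names `buffered`, `run_pipeline`, `upper_name`, and `drop_zero_qty` are hypothetical.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Buffer = List[Row]


def buffered(rows: Iterable[Row], size: int) -> Iterator[Buffer]:
    """Group a stream of rows into fixed-size in-memory buffers."""
    it = iter(rows)
    while True:
        buf = list(islice(it, size))
        if not buf:
            return
        yield buf


def run_pipeline(
    rows: Iterable[Row], size: int, *steps: Callable[[Buffer], Buffer]
) -> Iterator[Buffer]:
    """Push each buffer through every transformation step, one buffer at a time."""
    for buf in buffered(rows, size):
        for step in steps:
            buf = step(buf)  # each step builds a new intermediate buffer
        yield buf


def upper_name(buf: Buffer) -> Buffer:
    # Row-by-row change, loosely like a Derived Column transformation.
    return [{**r, "name": str(r["name"]).upper()} for r in buf]


def drop_zero_qty(buf: Buffer) -> Buffer:
    # Row filter, loosely like routing rows through a Conditional Split.
    return [r for r in buf if r["qty"] > 0]


if __name__ == "__main__":
    source = [{"name": n, "qty": q} for n, q in [("ann", 2), ("bob", 0), ("cy", 5), ("dee", 1), ("ed", 0)]]
    for i, buf in enumerate(run_pipeline(source, 2, upper_name, drop_zero_qty)):
        print(i, buf)
```

In this model, every added transformation is another pass over every buffer. That is one plausible way to see why heavily transformed bulk loads trail a straight bcp copy.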
The following additional SSIS data sources and destinations are supported:
An XML source for extracting data from XML documents directly
Full insert and update support for SQL Server Mobile destinations
Reading and writing Raw data files (sources and destinations)
Creating an in-memory ADO recordset via a destination
Direct access to a number of Analysis Services object destinations (for example, mining models, cubes, and dimensions)
The ADO.NET DataReader source and destination for reading and writing to any .NET Framework data provider
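To make the idea of an XML source concrete, the following Python sketch flattens a hypothetical XML document into tabular rows. This is roughly what an XML source does before it hands data to the rest of a data flow. The document, its element names, and the `xml_to_rows` function are illustrative assumptions. The actual SSIS XML source is configured in the designer rather than written in code.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical XML document; element and attribute names are illustrative only.
ORDERS_XML = """<orders>
  <order id="1001"><customer>Contoso</customer><amount>19.99</amount></order>
  <order id="1002"><customer>Fabrikam</customer><amount>5.50</amount></order>
</orders>"""


def xml_to_rows(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Flatten each <order> element into one row with one column per value."""
    root = ET.fromstring(xml_text)
    rows = []
    for order in root.findall("order"):
        rows.append(
            {
                "id": order.get("id"),                   # attribute -> column
                "customer": order.findtext("customer"),  # child element text -> column
                "amount": order.findtext("amount"),
            }
        )
    return rows


if __name__ == "__main__":
    for row in xml_to_rows(ORDERS_XML):
        print(row)
```

In this sketch every value comes out as a string. A separate type-conversion step would therefore be needed before loading numeric columns such as `amount`.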
SQL Server 2008 now supports the following additional SSIS data transformations:
Data warehousing operations, such as the Aggregate, Pivot, Unpivot, and Slowly Changing Dimension transformations
Enhanced text data mining via the Term Extraction and Term Lookup transformations
Caching for Lookup transformations
Enhancing data values from a lookup table via the Lookup and Fuzzy Lookup transformations
The identification of similar data rows via the Fuzzy Grouping transformation
Distribution of data to multiple downstream data flow components via the Conditional Split and Multicast transformations
The merging and combining of data rows from multiple upstream data flow components via the Union All, Merge, and Merge Join transformations
Extensive copying and modifying of column data values, using the Copy Column, Data Conversion, and Derived Column transformations
Sample rowset extractions, using the Percentage Sampling and Row Sampling transformations
Sorting of data and identification of duplicate data rows via the Sort transformation
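Several of the transformations listed above have simple row-set semantics. The Python functions below are hypothetical, simplified stand-ins, not SSIS components, and they ignore details such as error outputs and column data types. They mimic what the following transformations do to a list of rows:

- Conditional Split
- Multicast
- Lookup, using a full cache
- Aggregate
- Sort, with duplicate removal
- Union All
- Percentage Sampling

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def conditional_split(
    rows: List[Row], conditions: Sequence[Tuple[str, Callable[[Row], bool]]]
) -> Dict[str, List[Row]]:
    """Send each row to the first output whose condition is true, else to 'default'."""
    outputs: Dict[str, List[Row]] = {name: [] for name, _ in conditions}
    outputs["default"] = []
    for row in rows:
        for name, cond in conditions:
            if cond(row):
                outputs[name].append(row)
                break
        else:
            outputs["default"].append(row)
    return outputs


def multicast(rows: List[Row], n: int) -> List[List[Row]]:
    """Give each of n outputs its own copy of every row."""
    return [[dict(r) for r in rows] for _ in range(n)]


def lookup(rows: List[Row], reference: List[Row], key: str, column: str) -> List[Row]:
    """Full-cache lookup: load the reference rows once, then probe the cache per row.
    Unmatched rows get None here; a real Lookup can redirect them instead."""
    cache = {ref[key]: ref[column] for ref in reference}
    return [{**r, column: cache.get(r[key])} for r in rows]


def aggregate_sum(rows: List[Row], group_by: str, value: str) -> Dict[object, float]:
    """Group by one column and sum another, like a simple Aggregate transformation."""
    totals: Dict[object, float] = defaultdict(float)
    for r in rows:
        totals[r[group_by]] += float(r[value])
    return dict(totals)


def sort_distinct(rows: List[Row], keys: Sequence[str]) -> List[Row]:
    """Sort on the key columns and keep only the first row for each key value."""
    seen = set()
    result = []
    for r in sorted(rows, key=lambda row: tuple(row[k] for k in keys)):
        k = tuple(r[c] for c in keys)
        if k not in seen:
            seen.add(k)
            result.append(r)
    return result


def union_all(*inputs: List[Row]) -> List[Row]:
    """Append all inputs into one output without removing duplicates."""
    return [r for rows in inputs for r in rows]


def percentage_sample(rows: List[Row], pct: float, seed: Optional[int] = None) -> List[Row]:
    """Keep each row with probability pct/100, so the sample size is approximate."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < pct / 100]


if __name__ == "__main__":
    sales = [{"region": "East", "amt": 10}, {"region": "West", "amt": 7}, {"region": "East", "amt": 5}]
    print(conditional_split(sales, [("big", lambda r: r["amt"] >= 10)]))
    print(aggregate_sum(sales, "region", "amt"))
```

Because the sampling sketch draws each row independently, its output size only approximates the requested percentage. A fixed-count selection in the style of Row Sampling would need a different approach, such as `random.sample`.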
SSIS includes a set of tools and features that support managing, editing, executing, and migrating DTS packages from earlier versions of SQL Server. You can see all available DTS packages in SSMS (in a separate branch). You can also choose to migrate old DTS packages (from SQL Server 2000) forward to SSIS packages (in SQL Server 2008) via the Package Migration Wizard. It’s quite easy. If you can’t migrate your old DTS packages yet, you can execute them directly from SSIS packages. If you need to modify existing DTS packages, you can either download the special DTS Designer version for SQL Server 2008 from Microsoft’s website or just bite the bullet and migrate them forward. We recommend migrating as rapidly as is feasible.